Add browser-reverse skill — OpenAPI 3.1 from browser-trace captures#88
…aptures
Consumes a browser-trace run (.o11y/<run>/), pairs CDP request/response events, templatizes paths, infers JSON schemas from samples, and emits an OpenAPI 3.1 document with a coverage report and confidence metadata.
Pipeline: load → filter → normalize → infer → emit. Each stage is a discrete script writing to intermediate/ for debuggability.
Optional --bodies <path> flag joins a `browse network on` capture by CDP requestId so response bodies feed into schema inference.
E2E tested against Hacker News, jsonplaceholder, derekmeegan.com, browserbase.com, browser-use.com, reddit.com.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shrey150
left a comment
Some things to fix or address before rereviewing
| ``` | ||
| browser-trace → .o11y/<run>/cdp/network/{requests,responses}.jsonl | ||
| discover-api-spec → .o11y/<run>/api-spec/openapi.yaml + report.md |
Is this skill actually defined anywhere in this PR?
| `discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (e.g. didn't snapshot, want the live `browse network` dir), pass `--bodies <path>` explicitly. | ||
| Then deliver the artifacts to the user (`exec.sendFile()` for `openapi.yaml` and `report.md`). |
exec.sendFile() is for bb not for general use right?
this was from a claude memory i think lol...
| @@ -0,0 +1,118 @@ | |||
| # Adding Response Body Capture to `browser-trace` — Lift Estimate | |||
Is this plan mode slop lol
| @@ -0,0 +1,6 @@ | |||
| { | |||
| "name": "browser-reverse", | |||
I would still like a different name, like /discover-api or /browser-to-api or /website-to-api
| @@ -0,0 +1,240 @@ | |||
| # Browser Reverse — Reference | |||
I'm not sure this is what a REFERENCE.md file should be - it should exhaustively describe all commands used by the skill, I would maybe recommend removing the Pipeline portion here
Renaming and doc cleanup (per shrey150):
- Rename skill from `browser-reverse` to `browser-to-api`. Updates SKILL.md frontmatter + heading, package.json, REFERENCE.md heading, the OpenAPI doc's `info.description`, and the report.md heading.
- Fix the stale `discover-api-spec` reference in SKILL.md's composition diagram (left over from an earlier rename).
- Drop `BODY-CAPTURE-LIFT.md` from the PR; it's a separate proposal.
- Remove the `exec.sendFile()` reference in SKILL.md (browserbase-internal, not a generic skill primitive).
- REFERENCE.md restructured to lead with the script/CLI/file-format reference rather than an architecture intro. Pipeline diagram dropped.
Bug fixes (per Cursor Bugbot):
- `filter.mjs`: rework precedence so `--include` actually rescues URLs that would be hit by a default exclude, matching the documented contract. User `--exclude` still wins. Added a unit-style test path.
- `infer.mjs`: skip response-body samples whose CDP status is null. Previously they were keyed under `"0"` but `emit.mjs` only iterates `ep.statusCodes` (which excludes nulls), silently discarding the body.
- `load.mjs`: fix the comment in `urlQuery()`: code is first-value-wins, not last-value-wins.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
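The `--include` / `--exclude` precedence described in this commit message can be sketched roughly as below. This is only an illustration, not the actual `filter.mjs`: the function name `shouldKeep`, the `DEFAULT_EXCLUDES` list, and the use of regular expressions for patterns are all assumptions.

```javascript
// Hypothetical sketch of the documented precedence; not the real filter.mjs.
// DEFAULT_EXCLUDES is an assumed, illustrative list.
const DEFAULT_EXCLUDES = [/\.(png|css|js)$/i, /\/analytics\//i];

function shouldKeep(url, { include = [], exclude = [] } = {}) {
  // 1. A user --exclude always wins.
  if (exclude.some((re) => re.test(url))) return false;
  // 2. A user --include rescues URLs that a default exclude would drop.
  if (include.some((re) => re.test(url))) return true;
  // 3. Otherwise the default excludes decide.
  return !DEFAULT_EXCLUDES.some((re) => re.test(url));
}

console.log(shouldKeep('https://x.test/analytics/v1'));
console.log(shouldKeep('https://x.test/analytics/v1', { include: [/analytics/] }));
```

Checking the user exclude first keeps the rule "user `--exclude` still wins" true even when the same URL also matches an include.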
Pushed 9446f91 addressing all review comments:
@cursor[bot]
Branch name is still |
| function isKeySecret(name) { | ||
| const k = String(name).toLowerCase().replace(/[_-]/g, ''); | ||
| return KEY_DENY.has(k) || extraKeys.has(k); |
Extra redaction keys silently fail for body matching
Medium Severity
isKeySecret normalizes the input name by stripping underscores and hyphens via .replace(/[_-]/g, ''), but extraKeys stores user-provided --redact values with only toLowerCase() applied — no underscore/hyphen stripping. A user passing --redact my_secret_key stores my_secret_key in extraKeys, but the lookup normalizes the JSON key to mysecretkey, which never matches. User-specified body key redactions containing _ or - are silently ignored, potentially leaking credentials the user explicitly asked to scrub.
Reviewed by Cursor Bugbot for commit 9446f91. Configure here.
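One way to address this finding is to run user-supplied `--redact` values through the same normalization used at lookup time. The sketch below assumes a helper named `normalizeKey` and a factory named `buildKeyMatcher`, neither of which is in the PR, and the `KEY_DENY` contents are illustrative.

```javascript
// Sketch only: normalizeKey, buildKeyMatcher and the KEY_DENY contents are assumptions.
const normalizeKey = (name) => String(name).toLowerCase().replace(/[_-]/g, '');

const KEY_DENY = new Set(['password', 'apikey']);

function buildKeyMatcher(redactFlags = []) {
  // Normalize user-supplied --redact values the same way lookups are normalized.
  const extraKeys = new Set(redactFlags.map(normalizeKey));
  return (name) => {
    const k = normalizeKey(name);
    return KEY_DENY.has(k) || extraKeys.has(k);
  };
}

const isKeySecret = buildKeyMatcher(['my_secret_key']);
console.log(isKeySecret('my_secret_key')); // true
console.log(isKeySecret('My-Secret-Key')); // true
console.log(isKeySecret('username'));      // false
```

Sharing one normalizer between insertion and lookup means the two sides cannot drift apart again.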
normalize.mjs: - Auto-classify endpoints as api/noise/page and drop non-API traffic (tracking, analytics, bot defense, session plumbing, HTML page renders) - Detect multiplexed endpoints (GraphQL operationName, JSON-RPC method, query param dispatch) and decompose into separate logical operations - Typically drops 60-80% of captured traffic as noise emit.mjs: - Generate client.mjs — zero-dependency ES module wrapping each discovered operation as an async function with JSDoc param types - For GraphQL/APQ endpoints, embeds persisted query hashes and wires up the full request shape so callers just pass variables - Extract required headers from trace (CSRF tokens, custom headers) and include them in client defaults - Task-oriented report.md with quick-start import, curl examples, variables tables, and response samples per operation On OpenTable trace: 27 raw endpoints → 9 named operations, zero noise. Generated client with autocomplete(), restaurantsAvailability(), etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| const NOISE_PATH_PATTERNS = [ | ||
| // Tracking / analytics / telemetry | ||
| /\/track(ing)?[\/\b]/i, /\/pixel/i, /\/beacon/i, /\/log[\/\b]/i, | ||
| /\/impression/i, /\/pageview/i, /\/click[\/\b]/i, |
Regex \b in character class matches backspace, not word boundary
Medium Severity
Several NOISE_PATH_PATTERNS use [\/\b] intending to match a slash or word boundary. However, \b inside a character class [...] matches the backspace character (U+0008), not a word boundary assertion. This affects patterns like /\/track(ing)?[\/\b]/i, /\/log[\/\b]/i, /\/click[\/\b]/i, and /\/experiment[\/\b]/i. Paths ending with these segments (e.g., /api/track, /api/log) without a trailing slash will not be classified as noise and will leak through into the discovered spec.
Reviewed by Cursor Bugbot for commit dc07d29. Configure here.
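The difference is easy to reproduce in isolation. The snippet below compares the pattern from the diff with one possible replacement that uses a lookahead for a slash, query, fragment, or end of input; the replacement is a suggestion, not the fix that landed.

```javascript
// Inside [...], \b is a backspace (U+0008), so this only matches when "/" follows.
const buggy = /\/track(ing)?[\/\b]/i;
// Assumed alternative: a lookahead for "/", "?", "#", or end of input.
const fixed = /\/track(ing)?(?=[\/?#]|$)/i;

for (const p of ['/api/track', '/api/track/', '/api/tracking?x=1', '/api/tracker']) {
  console.log(p, 'buggy:', buggy.test(p), 'fixed:', fixed.test(p));
}
// /api/track        buggy: false  fixed: true
// /api/track/       buggy: true   fixed: true
// /api/tracking?x=1 buggy: false  fixed: true
// /api/tracker      buggy: false  fixed: false
```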
| if (hash) lines.push(` ${op.operationName}: '${hash}',`); | ||
| } | ||
| lines.push(`};\n`); | ||
| } |
HASHES constant redeclared for multiple persisted-query endpoints
Medium Severity
When multiple GraphQL parent paths use persisted queries, the generated client.mjs emits const HASHES = {...} once per parent path. Since all declarations are at the module's top-level scope, the second const HASHES declaration produces a JavaScript SyntaxError, making the entire generated client unusable. The variable name needs to be unique per parent path (e.g., incorporating the path or a counter).
Reviewed by Cursor Bugbot for commit dc07d29. Configure here.
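Besides giving each constant a unique name, another option is to emit a single `HASHES` object keyed by parent path, so the generated module declares it exactly once. The sketch below assumes an input shape of `{ path, ops: [{ operationName, hash }] }` and a function name `emitHashes`; both are hypothetical and not taken from `emit.mjs`.

```javascript
// Sketch: one HASHES declaration for all persisted-query parent paths.
// The input shape and the function name are assumptions for illustration.
function emitHashes(parents) {
  const lines = ['const HASHES = {'];
  for (const { path, ops } of parents) {
    lines.push(`  ${JSON.stringify(path)}: {`);
    for (const op of ops) {
      if (op.hash) {
        lines.push(`    ${JSON.stringify(op.operationName)}: ${JSON.stringify(op.hash)},`);
      }
    }
    lines.push('  },');
  }
  lines.push('};');
  return lines.join('\n');
}

console.log(emitHashes([
  { path: '/gql', ops: [{ operationName: 'Autocomplete', hash: 'abc' }] },
  { path: '/gql2', ops: [{ operationName: 'Search', hash: 'def' }] },
]));
```

Generated call sites would then look up `HASHES[path][operationName]` rather than `HASHES[operationName]`.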
| // Regular REST endpoints | ||
| for (const ep of regular) { | ||
| const fnName = makeOpId(ep).replace(/^(get|post|put|patch|delete)_/, (_, m) => m); |
Replace callback keeps method name instead of stripping prefix
Low Severity
The replace callback (_, m) => m returns the captured HTTP method name (get, post, etc.) instead of an empty string. This replaces get_ with get (only removing the underscore), producing function names like getv1_items_id instead of the intended v1_items_id. The replacement string should be '' to properly strip the method prefix.
Reviewed by Cursor Bugbot for commit dc07d29. Configure here.
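The two behaviours can be compared directly. The input string here is an example value; the exact format produced by `makeOpId` is assumed from the names quoted in the review.

```javascript
// Example id; the real makeOpId output format is assumed.
const opId = 'get_v1_items_id';

// As written in the diff: the callback returns the captured method, so only "_" is removed.
const kept = opId.replace(/^(get|post|put|patch|delete)_/, (_, m) => m);
// As the review suggests: replace the whole prefix with an empty string.
const stripped = opId.replace(/^(get|post|put|patch|delete)_/, '');

console.log(kept);     // getv1_items_id
console.log(stripped); // v1_items_id
```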
| lines.push(` 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',`); | ||
| for (const [k, v] of Object.entries(observedHeaders)) { | ||
| lines.push(` '${k}': '${v}',`); | ||
| } |
Unescaped dynamic values injected into generated JavaScript source
Medium Severity
Header values from HTTP traces are interpolated directly into single-quoted JavaScript string literals in the generated client.mjs without escaping. If any observed header value contains a single quote (or backslash, newline, etc.), the generated code will have a syntax error and fail to parse. Values need to be escaped, e.g., using JSON.stringify().
Reviewed by Cursor Bugbot for commit dc07d29. Configure here.
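A minimal sketch of the `JSON.stringify()` approach mentioned above follows. The function name `emitHeaderLines` is hypothetical; only `observedHeaders` comes from the diff.

```javascript
// Sketch: emit header entries as properly escaped string literals.
function emitHeaderLines(observedHeaders) {
  const lines = [];
  for (const [k, v] of Object.entries(observedHeaders)) {
    // JSON.stringify yields a double-quoted literal with quotes,
    // backslashes, and newlines escaped.
    lines.push(`  ${JSON.stringify(k)}: ${JSON.stringify(String(v))},`);
  }
  return lines;
}

console.log(emitHeaderLines({ 'x-csrf-token': "a'b\\c" }).join('\n'));
```

The same treatment would apply to any other trace-derived value interpolated into generated source, such as the persisted-query hashes.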
Generates index.html with:
- Summary stats (operations, endpoint, protocol, sample count)
- Expandable cards per operation with variables table, client usage, request body, and response example
- Full generated client.mjs embedded at the bottom
The Swagger UI was a poor fit: 10 identical green POST bars for a single GraphQL endpoint with bracket-syntax paths that aren't even valid OpenAPI. The HTML report shows what actually matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
emit.mjs already generates index.html as the primary visual output; update SKILL.md to match and remove the dead open-swagger-ui.mjs script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 3 potential issues.
There are 8 total unresolved issues (including 5 from previous reviews).
Reviewed by Cursor Bugbot for commit 5eee2ab. Configure here.
| <div class="card" id="op-${i}"> | ||
| <div class="card-header" onclick="this.parentElement.classList.toggle('open')"> | ||
| <div class="card-title"> | ||
| <span class="method">POST</span> |
HTML report hardcodes "POST" for every endpoint card
High Severity
The method badge in buildHtmlReport is hardcoded to POST for every endpoint card, regardless of the actual HTTP method. ep.method is available and used correctly on line 659, but line 650 emits a static POST string. Every GET, PUT, DELETE, and PATCH endpoint in the generated index.html will incorrectly display as POST.
Reviewed by Cursor Bugbot for commit 5eee2ab. Configure here.
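A possible shape for the fix is to render the badge from `ep.method` and escape it before interpolation. The helper names `escapeHtml` and `methodBadge` are assumptions; `buildHtmlReport` may already have its own escaping utility.

```javascript
// Minimal HTML escaper for values interpolated into the report (assumed helper).
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, (c) => (
    { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]
  ));
}

// Render the badge from the endpoint's own method instead of a literal "POST".
function methodBadge(ep) {
  const method = String(ep.method).toUpperCase();
  return `<span class="method">${escapeHtml(method)}</span>`;
}

console.log(methodBadge({ method: 'get' }));    // <span class="method">GET</span>
console.log(methodBadge({ method: 'DELETE' })); // <span class="method">DELETE</span>
```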
| writeJson(path.join(outDir, 'confidence.json'), confidence); | ||
| // report.md | ||
| const redaction = readJson(intermediatePath(outDir, 'redaction-stats.json'), { headers: 0, bodyKeys: 0, bodyValues: 0 }); |
Redaction stats loaded and passed but never used
Low Severity
The redaction variable is read from disk via readJson on line 297 and passed to buildReport, where it's destructured as a parameter but never referenced in the function body. This is dead code — the redaction statistics (header count, body key count, body value count) are computed and persisted but never displayed in the report.
Reviewed by Cursor Bugbot for commit 5eee2ab. Configure here.
| // Mixed types — fall back to a typed union via "type" array (OpenAPI 3.1 / draft 2020-12 OK). | ||
| const out = { type: nullable ? [...nonNull, 'null'] : nonNull }; | ||
| return out; |
All-null fields produce invalid empty type array schema
Medium Severity
When all samples for a field are null, toSchema falls through every branch to the mixed-types case and produces { type: [] } — an empty type array. This is because nonNull is empty and nullable is false (it requires nonNull.length > 0). The correct output for an always-null field is { type: 'null' }. This produces an invalid JSON Schema fragment in the emitted OpenAPI spec.
Reviewed by Cursor Bugbot for commit 5eee2ab. Configure here.
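The type-selection step with the all-null case handled could look like the sketch below. It covers only the `type` keyword, not the rest of `toSchema`; the function name `typeKeyword` and the assumption that observed types arrive as a `Set` of JSON type names are illustrative.

```javascript
// Sketch of the type-selection step only. `types` is an assumed Set of
// JSON type names observed across samples for one field.
function typeKeyword(types) {
  if (types.size === 0) return {};                     // no samples: no constraint
  const nonNull = [...types].filter((t) => t !== 'null');
  const sawNull = types.has('null');
  if (nonNull.length === 0) return { type: 'null' };   // every sample was null
  if (nonNull.length === 1 && !sawNull) return { type: nonNull[0] };
  // Mixed or nullable: a "type" array, as allowed by OpenAPI 3.1.
  return { type: sawNull ? [...nonNull, 'null'] : nonNull };
}

console.log(typeKeyword(new Set(['null'])));           // { type: 'null' }
console.log(typeKeyword(new Set(['string', 'null']))); // { type: [ 'string', 'null' ] }
```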


Summary
`browser-reverse` consumes a `browser-trace` run directory and emits an OpenAPI 3.1 spec for the publicly observable HTTP API of any website, plus a human-readable coverage report and per-endpoint confidence metadata. It is pure offline post-processing and composes cleanly with the existing `browser-trace` skill rather than duplicating capture.
Pipeline (each stage is a discrete script for debuggability via `--stage`): load → filter → normalize → infer → emit
Highlights
- … `{id}`, `{id2}`, etc.
- … (`lib/schema-merge.mjs`): JSON Schema from samples with required-intersection, type unions, format hints (`date-time`, `uri`, `email`, `uuid`), and enum detection that requires meaningful repetition (not just low cardinality).
- … `$ref`: recurses into nested object/array schemas, hoists when referenced ≥ 2 times OR when it's an object with ≥ 4 properties. Names derived from path tokens.
- … (`Authorization`, `Cookie`, `*-token`, etc.), in body keys (`password`, `apiKey`, etc.), and value patterns (JWTs, emails, phone numbers). Replaces values with `<redacted>` to preserve types for inference.
- `browse network on` integration: pass `--bodies <path>` (or stash bodies under `<run>/cdp/network/bodies/`, which is auto-detected) to join real response bodies into the trace by CDP `requestId`. Without it, the spec has request bodies but no response-body schemas (the `browse cdp` firehose doesn't embed bodies).
- … `(method, path)`, the higher-sample operation wins and other origins are recorded under `x-also-served-from` rather than silently dropped.
- `report.md` lists every endpoint with samples, statuses, confidence, and normalization flags (`single-sample`, `single-status`, `mixed-content-types`, `divergent-response-shape`, `request-body-only-on-some-samples`).
Composition with browser-trace
End-to-end testing
Pipeline ran clean against six sites; five real bugs surfaced and fixed during this work:
- … `integer` vs `string` on `id` based on values)
- … `200` + `404`, header + body redaction
- … `_rsc` query param, Vercel analytics body schema, mixed-content-types detection
- … `/pixel/{id}/visitor/{id2}/cerebro`, 12 components hoisted
- … `/api/md/<slug>` LLM-friendly markdown export endpoint
- … `/svc/shreddit/events` (Reddit's internal telemetry, 18 nested types), live `ExposeVariant` GraphQL exposure capturing experiment names
Bugs surfaced and fixed during E2E:
- … `distinct ≤ floor(samples/2)` so unique IDs don't become enums.
- … `{...}.length` is `undefined`; rewrote to use `Object.keys(...).length` and recurse into nested schemas.
- … `redactBody()` called twice per body; redact once and reuse.
- … `@`, `` ` ``, `#`, etc. as first character now trigger quoting (was breaking on `@vercel/analytics/react`).
- … `paths.<path>.<method>` is unique in OpenAPI; higher-sample winner now recorded with `x-also-served-from` extension.
Files
- `SKILL.md` / `REFERENCE.md`: skill docs, file format reference, jq recipes, troubleshooting
- `scripts/discover.mjs`: top-level dispatcher with `--stage` for partial runs
- `scripts/{load,filter,normalize,infer,emit}.mjs`: pipeline stages
- `scripts/lib/{io,redact,path-template,schema-merge,yaml}.mjs`: pure helpers
- `BODY-CAPTURE-LIFT.md`: design doc for adding native body capture to `browser-trace` (alternative to the current `browse network on` pairing). Open question for maintainers; no code change in this PR.
Test plan
- … `SKILL.md`
- `openapi.yaml` parses with a YAML library (`python -c "import yaml; yaml.safe_load(open('...'))"`)
- `openapi.json` parses (`jq . openapi.json`)
- `report.md` correctly flags low-confidence endpoints
- … `--bodies` flag with a `browse network on` capture and confirm response-body schemas appear in the spec
🤖 Generated with Claude Code
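The enum rule quoted in the bug list above (`distinct ≤ floor(samples/2)`) can be illustrated with a few lines. The helper name `looksLikeEnum` is hypothetical, and the real check in `lib/schema-merge.mjs` may apply further conditions.

```javascript
// Sketch of the repetition rule only; the helper name is an assumption.
function looksLikeEnum(values) {
  const distinct = new Set(values).size;
  // Require each distinct value to repeat on average at least twice,
  // so a column of unique IDs is never promoted to an enum.
  return distinct > 0 && distinct <= Math.floor(values.length / 2);
}

console.log(looksLikeEnum(['open', 'closed', 'open', 'closed'])); // true
console.log(looksLikeEnum(['id1', 'id2', 'id3']));                // false
console.log(looksLikeEnum(['only-one-sample']));                  // false
```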
Note
Medium Risk
Mostly additive (new `browser-to-api` skill), but it introduces a non-trivial inference/emission pipeline that could generate incorrect specs or leak sensitive data if redaction misses app-specific secrets.
Overview
Adds a new `browser-to-api` skill that post-processes a `browser-trace` run into a best-effort OpenAPI 3.1 spec plus artifacts (`index.html` report, `report.md`, `confidence.json`, and a generated `client.mjs`).
Implements a 5-stage Node (stdlib-only) pipeline (`load`/`filter`/`normalize`/`infer`/`emit`) including optional joining of `browse network on` request/response bodies by CDP `requestId`, URL templating + noise filtering, multiplexed endpoint decomposition (e.g., GraphQL `operationName`), schema inference with redaction, and OpenAPI emission with component schema hoisting and cross-origin collision handling.
Reviewed by Cursor Bugbot for commit cf3e72b. Bugbot is set up for automated code reviews on this repo. Configure here.